1. (Original) A method for recovering from failures affecting a resource manager within a group of resource managers, wherein the resource managers within the group have access to a shared resource via which remote resource managers communicate with the resource managers within the group, the shared resource including data storage structures to which resource managers within said group connect to send and receive communications, the method comprising:

storing, within a first data storage structure of the shared resource, unit of work descriptors for operations performed in relation to said shared resource by the resource managers in said group;

sending a notification of a connection failure between a second data storage structure of the shared resource and a first resource manager within said group, the notification being sent to the remaining resource managers within the group which are connected to the second data storage structure;

one or more of said remaining resource managers accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first resource manager when the connection failure occurred; and

said one or more remaining resource managers recovering the identified units of work.
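
The steps of claim 1 can be pictured with a small in-memory model. This is an illustrative sketch only and not part of the claims: the names `UnitOfWork`, `SharedResource`, `connection_failed` and `recover`, and the manager and structure names used with them, are hypothetical, and setting a `recovered` field stands in for the real commit or back-out work.

```python
from dataclasses import dataclass, field


@dataclass
class UnitOfWork:
    uow_id: str
    owner: str          # resource manager performing the unit of work
    structure: str      # data storage structure the work relates to
    recovered: bool = False


@dataclass
class SharedResource:
    # Stand-in for the first data storage structure (descriptor store).
    admin: list = field(default_factory=list)
    # Structure name -> set of connected resource managers.
    connections: dict = field(default_factory=dict)

    def connection_failed(self, structure, failed):
        """Drop the failed connection and return the peers to notify."""
        self.connections[structure].discard(failed)
        return sorted(self.connections[structure])


def recover(shared, structure, failed, peer):
    """A notified peer scans the descriptors and recovers matching work."""
    done = []
    for uow in shared.admin:
        if (uow.owner == failed and uow.structure == structure
                and not uow.recovered):
            uow.recovered = True  # placeholder for commit or back-out
            done.append((peer, uow.uow_id))
    return done
```

In this model only work owned by the failed manager and relating to the failed structure is picked up; work on other structures is left alone.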

2. (Original) A method according to claim 1 wherein, if there are no remaining resource managers connected to the second data storage structure after said connection failure, said notification is sent to a remaining resource manager when that resource manager connects to the second data storage structure.

3. (Original) A method according to claim 1 wherein, if there are no remaining resource managers connected to the second data storage structure after said connection failure, the failed resource manager determines when it is restarted whether any other resource manager has performed recovery for its units of work relating to the second data storage structure and, upon determining that no resource manager has performed said recovery, the restarted resource manager recovers said units of work.

4. (Original) A method according to claim 1, wherein all remaining resource managers within the group which are connected to the second data storage structure respond to said notification by attempting to access said first data storage structure to identify units of work to recover, and the method includes the further steps of:

responsive to a first remaining resource manager identifying a unit of work to recover, said first remaining resource manager attempting to set a flag for said unit of work;

responsive to successfully setting said flag, assigning recovery responsibility for said unit of work to said first remaining resource manager; and

refusing to assign recovery responsibility for said unit of work to said first remaining resource manager if said flag has been set by another remaining resource manager.

5. (Original) A method according to claim 4, including the further step of:

responsive to said flag having been set by another remaining resource manager, said first remaining resource manager attempting to identify a further unit of work to recover and attempting to set a flag for said identified further unit of work.
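
The flag handling of claims 4 and 5 behaves like an atomic test-and-set. The sketch below is illustrative only and not part of the claims: `RecoveryFlags`, `try_set` and `claim_next` are hypothetical names, and a `threading.Lock` stands in for whatever serialisation the shared resource actually provides.

```python
import threading


class RecoveryFlags:
    """Per-unit-of-work flags; the first manager to set a flag owns recovery."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = {}

    def try_set(self, uow_id, manager):
        """Return True if this manager set the flag, False if already set."""
        with self._lock:
            if uow_id in self._owner:
                return False
            self._owner[uow_id] = manager
            return True

    def owner(self, uow_id):
        return self._owner.get(uow_id)


def claim_next(flags, candidates, manager):
    """Claim the first unflagged unit of work, moving on when one is taken."""
    for uow_id in candidates:
        if flags.try_set(uow_id, manager):
            return uow_id
    return None
```

A manager that loses the race for one unit of work simply tries the next candidate, so each unit of work ends up with exactly one recovering manager.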

6. (Original) A method according to claim 4, including the following steps in response to a connection failure between the second data storage structure of the shared resource and said first remaining resource manager during recovery of said unit of work:

sending a notification of said connection failure to the remaining resource managers within the group which are connected to the second data storage structure;

one or more of said remaining resource managers accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first remaining resource manager when the connection failure occurred; and

said one or more remaining resource managers recovering the identified units of work.

7. (Original) A method according to claim 1, wherein the unit of work descriptors include:

a unit of work identifier;

an identification of messages put or retrieved within the unit of work;

a status for the unit of work; and

a sequence number.
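
The four descriptor fields of claim 7 map naturally onto a record type. The following is an illustrative sketch only and not part of the claims: the class and field names are hypothetical, the three status values are an assumption (the claim does not enumerate them), and the claim does not say how the sequence number is used.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Assumed status values; the claim only requires "a status".
    IN_FLIGHT = "in-flight"
    COMMIT_DECIDED = "commit-decided"
    ABORT_DECIDED = "abort-decided"


@dataclass
class UowDescriptor:
    uow_id: str                                   # unit of work identifier
    messages: list = field(default_factory=list)  # messages put or retrieved
    status: Status = Status.IN_FLIGHT             # status for the unit of work
    sequence: int = 0                             # sequence number
```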

8. (Original) A method according to claim 1, wherein the shared resource is a coupling facility list structure, the second data storage structure is a coupling facility list structure in which a coupling facility list header represents a shared access message queue, and the first data storage structure is an administration list structure of the coupling facility for storing unit of work descriptors.

9. (Original) A method according to claim 8, including storing within the coupling facility, for each resource manager within the group, a list header information map representing the set of shared access message queues within the second data storage structure for which the resource manager has performed some work.

10. (Original) A method according to claim 9, including reading said list header information map during recovery to identify the set of shared access message queues within the second data storage structure for which the failed resource manager has performed some work.

11. (Original) A method according to claim 1, including storing within the shared resource a structure interest map identifying the set of data storage structures to which respective resource managers within said group are connected.

12. (Original) A method according to claim 11, wherein the step of recovering the identified units of work is a first recovery phase and wherein the method includes a second recovery phase comprising the steps of:

reading the structure interest map for the failed resource manager to identify the set of data storage structures to which the failed resource manager was connected at the time of said connection failure;

identifying any operations performed by the failed resource manager on said set of data storage structures which were not recovered in the first recovery phase; and

one or more of said remaining resource managers then backing out said unrecovered operations.



13. (Original) A method according to claim 12, wherein the method includes setting a key for operations performed in relation to the shared resource, the key identifying the resource manager which performed the operation, and wherein the identification of operations performed by the failed resource manager comprises checking said keys for unrecovered operations performed in relation to any of said set of data storage structures.
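
The second recovery phase of claims 12 and 13 amounts to a filter over keyed operations. The sketch below is illustrative only and not part of the claims: the function name, the dictionary layout of each operation and the `already_recovered` set are all hypothetical.

```python
def operations_to_back_out(interest_map, operations, failed, already_recovered):
    """Select the failed manager's operations that phase one did not recover.

    interest_map: manager name -> set of structures it was connected to.
    operations: dicts with "id", "key" (the performing manager), "structure".
    """
    structures = interest_map.get(failed, set())
    return [
        op["id"]
        for op in operations
        if op["key"] == failed              # key identifies the performer
        and op["structure"] in structures   # only structures it connected to
        and op["id"] not in already_recovered
    ]
```

The returned operations are the ones a remaining resource manager would then back out.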

14. (Original) A method according to claim 1, wherein a single unit of work represented by a unit of work descriptor may include operations performed in relation to a plurality of data storage structures, and wherein the partial units of work corresponding to said operations are recovered by different ones of said remaining resource managers within the group.

15. (Original) A method for recovering from failures affecting a resource manager within a group of resource managers, wherein the resource managers within the group have access to a shared resource, the shared resource including data storage structures to which resource managers within said group connect to perform operations in relation to data held in said shared resource, the method comprising:

storing, within a first data storage structure of the shared resource, unit of work descriptors for operations performed by the resource managers in said group in relation to data held in said shared resource;

sending a notification of a connection failure between a second data storage structure of the shared resource and a first resource manager within said group, the notification being sent to the remaining resource managers within the group which are connected to the second data storage structure;

one or more of said remaining resource managers accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first resource manager when the connection failure occurred; and

said one or more remaining resource managers recovering the identified units of work.

16. (Original) A method according to claim 15, wherein the data storage structures of said shared resource include data storage structures which contain shared message queues and said operations performed in relation to said shared resource include putting messages onto a shared message queue and retrieving messages from a shared message queue, for communication between a remote resource manager and resource managers within said group.

17. (Original) A method according to claim 16, wherein the unit of work descriptors include:

a unit of work identifier;

an identification of messages put or retrieved within the unit of work;

a status for the unit of work; and

a sequence number.

18. (Original) A method according to claim 16, wherein the operations of putting messages onto a shared queue and retrieving messages from a shared queue are performed under transactional scope such that a message which is put is only available to resource managers other than the resource manager putting the message after commitment of the put operation and a message which is retrieved is only available to the retrieving resource manager after commitment of the retrieval operation, and wherein said stored unit of work descriptors identify each of the following:

units of work that were uncommitted but for which a decision to commit had been made when the failure occurred;

units of work that were uncommitted but for which a decision to abort had been made when the failure occurred; and

units of work for which no commit or abort decision had been made when the failure occurred;

and wherein recovering the identified units of work comprises:

committing message put and retrieval operations for which a decision to commit had been made;

backing out message put and retrieval operations for which a decision to back out had been made; and

backing out message put and message retrieval operations for which no commit or abort decision had been made.
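
The recovery rule in claim 18 reduces to a three-way decision table: only units of work with a recorded commit decision are committed, and everything else is backed out. The sketch below is illustrative only and not part of the claims; `Decision` and `recovery_action` are hypothetical names.

```python
from enum import Enum


class Decision(Enum):
    COMMIT = "commit"   # uncommitted, but a decision to commit was made
    ABORT = "abort"     # uncommitted, but a decision to abort was made
    NONE = "none"       # no commit or abort decision was made


def recovery_action(decision):
    """Map the recorded decision to the action taken during recovery."""
    if decision is Decision.COMMIT:
        return "commit"
    # Both an abort decision and the absence of any decision are backed out.
    return "back out"
```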

19. (Original) A distributed data processing system including:

a plurality of resource managers;

a shared access resource including data storage structures to which the resource managers connect to send and receive communications to and from remote resource managers, the shared access resource including:

means for storing, within a first data storage structure of the shared resource, unit of work descriptors for operations performed in relation to said shared resource by the resource managers in said plurality; and

means for sending a notification of a connection failure between a second data storage structure of the shared resource and a first resource manager within said plurality, the notification being sent to the remaining resource managers within the plurality which are connected to the second data storage structure;

wherein said remaining resource managers include:

means for accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first resource manager when the connection failure occurred; and

means for recovering the identified units of work.

20. (Original) A computer program product comprising program code recorded on a machine-readable recording medium, the program code comprising the following set of components:

a plurality of resource managers;

a shared access resource manager including program code for managing storage and retrieval of data within data storage structures to which the resource managers connect to send and receive communications to and from remote resource managers, the shared access resource manager including:

means for storing, within a first data storage structure of the shared resource, unit of work descriptors for operations performed in relation to said shared resource by the resource managers in said plurality; and

means for sending a notification of a connection failure between a second data storage structure of the shared resource and a first resource manager within said plurality, the notification being sent to the remaining resource managers within the plurality which are connected to the second data storage structure;

wherein said remaining resource managers include:

means for accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first resource manager when the connection failure occurred; and

means for recovering the identified units of work.